Table Extraction Using Spatial Reasoning on the CSS2 Visual Box Model
Authors
Abstract
Tables on web pages contain a huge amount of semantically explicit information, which makes them a worthwhile target for automatic information extraction and knowledge acquisition from the Web. However, extracting tables from web pages is difficult, because HTML was designed to convey visual rather than semantic information. In this paper, we propose a robust technique for table extraction from arbitrary web pages. This technique relies upon the positional information of visualized DOM element nodes in a browser and thereby separates the intricacies of code implementation from the actual intended visual appearance. The novel aspect of the proposed web table extraction technique is the effective use of spatial reasoning on the CSS2 visual box model, which shows a high level of robustness even without any form of learning (F-measure ≈ 90%). We describe the ideas behind our approach, the tabular pattern recognition algorithm operating on a double topographical grid structure and allowing for effective and robust extraction, and general observations on web tables that should be borne in mind by any automatic web table extraction mechanism.
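To make the idea of reasoning over rendered positions more concrete, the following is a minimal sketch in Python. It is not the paper's algorithm (the double topographical grid structure is not described in enough detail here to reproduce); it only illustrates, under simplifying assumptions, how bounding boxes of rendered elements could be tested for a tabular arrangement without looking at the HTML markup. The `Box` type, the `cluster` and `extract_grid` functions, and the pixel tolerance are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Box:
    """Rendered bounding box of one element, in pixels (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float
    text: str


def cluster(values: Sequence[float], tol: float) -> List[float]:
    """Return one anchor per group of coordinates lying within `tol` of the group's smallest value."""
    anchors: List[float] = []
    for v in sorted(values):
        if not anchors or v - anchors[-1] > tol:
            anchors.append(v)
    return anchors


def index_of(value: float, anchors: Sequence[float], tol: float) -> int:
    """Find the first anchor that `value` is aligned with."""
    for i, a in enumerate(anchors):
        if abs(value - a) <= tol:
            return i
    raise ValueError("value does not match any anchor")


def extract_grid(boxes: Sequence[Box], tol: float = 2.0) -> Optional[List[List[str]]]:
    """Return the cell texts as rows if the boxes form a complete grid, else None.

    Assumption: a table is a set of boxes whose top edges align into rows and
    whose left edges align into columns, with exactly one box per cell.
    """
    rows = cluster([b.top for b in boxes], tol)
    cols = cluster([b.left for b in boxes], tol)
    if len(rows) < 2 or len(cols) < 2:
        return None  # a single row or column is treated as a list, not a table
    grid: List[List[Optional[str]]] = [[None] * len(cols) for _ in rows]
    for b in boxes:
        r = index_of(b.top, rows, tol)
        c = index_of(b.left, cols, tol)
        if grid[r][c] is not None:
            return None  # two boxes compete for the same cell
        grid[r][c] = b.text
    if any(cell is None for row in grid for cell in row):
        return None  # at least one cell is empty
    return [[cell or "" for cell in row] for row in grid]


# Made-up sample boxes with slight rendering jitter in their coordinates.
sample = [
    Box(10, 10, 60, 30, "Name"),
    Box(70, 10.5, 120, 30, "Year"),
    Box(10, 40, 60, 60, "CSS2"),
    Box(70.8, 40, 120, 60, "1998"),
]
table = extract_grid(sample)
```

In a browser, coordinates of this kind would come from the rendering engine rather than from the markup, which is what lets such an approach ignore whether the page author used `<table>` tags, nested `<div>` elements, or something else. A real system would also need to handle spanning cells, nested tables, and empty cells, all of which this sketch rejects.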
Similar resources
THE EFFECT OF POSTERIOR CEREBRAL PULMONARY DIRECT ELECTRICAL STIMULATION (TDCS) ON IMPROVING SPATIAL, VISUAL, AND VERBAL PERCEPTUAL ABILITIES
Background & Aims: Direct electrical stimulation of the brain is a therapeutic technique that can be effective in improving visual, verbal, and spatial perception. The present study investigated the effect of direct electrical stimulation (tDCS) of the posterior parietal cortex on improving spatial, visual, and verbal perceptual abilities. Materials & Methods: In this quasi-experimental study,...
A Constraint-Based Specification for Box Layout in CSS2
Cascading Style Sheets provide a flexible mechanism for governing the appearance of Web pages. Cascading Style Sheets Level 2 (CSS2) are an enhancement to the original CSS1 specification, giving Web page designers additional control over the appearance of Web pages. However, the CSS2 specification is written in English, leaving open the possibility of ambiguity or inconsistency. We present a formal...
Investigating the Cognitive Functions of Students with Stuttering
Objective: Stuttering is one of the most common speech disorders that generate many complications in children and adults. This disorder involves behavioral, cognitive and emotional interactions. So, the purpose of the current study is to investigate the cognitive functions of students with stuttering. Materials & Methods: A descriptive study, comprising 30 students (8 females and 22 males) fr...
Visual routines and attention
The human visual system solves an amazing range of problems in the course of everyday activities. Without conscious effort, the human visual system finds a place on the table to put down a cup, selects the shortest checkout queue in a grocery store, looks for moving vehicles before we cross a road, and checks to see if the stoplight has turned green. Inspired by the human visual system, I have ...
Publication date: 2006